Beginner’s mind (Shoshin) denotes openness, eagerness and a lack of preconceptions when studying a subject, just as a beginner would have, no matter what level of expertise the student has.
Even black belt martial artists practice basic techniques like blocks and punches every time they train.
This session doesn’t assume any prior knowledge of R, and introduces the basics. For some this will be revision from last year, but we provide additional material for advanced students to test their knowledge and extend familiar skills.
General principles
- Reproducibility and transparency in science (as a motivation for using R)
- Precision and attention to detail as an important skill.
R techniques covered
- Using the RStudio interface
- Working interactively in R Markdown
- Creating a chunk
- Loading packages
- Built-in ‘dataframes’ and tibbles (mtcars, diamonds); other datasets inside packages (gapminder)
- glimpse, head and clicking on the Environment window to look at data
- The pipe %>%
- Introduce ggplot; demonstrate geom_point and geom_boxplot
- Highlight types of data shown in glimpse (dbl, ord) and show the problem in a continuous color scale
Using the RStudio interface
These worksheets assume that you are using a web browser to access the RStudio Server at Plymouth University.
NOTE: RStudio works on most web browsers (e.g. Firefox, Safari, Chrome) but does not work that well on the default web browser in Windows 10 (“Edge”). If you’re using Windows, we recommend downloading Firefox and using that. Firefox is free and open source.
When you log in to RStudio, you’ll be greeted with a screen that looks something like the image below. (If you’ve used RStudio before, you will see some additional folders and files.)
RStudio on first opening
When you open RStudio for the first time, you can see three parts:
The Console - This is the large rectangle on the left. This is where you tell R what to do, and it’s also where R prints the answers to your questions.
The Environment - This is the rectangle on the top right. This is where R keeps a list of the data it knows about. It’s empty at the moment, because we haven’t given R any data yet.
The Files - This is the rectangle on the bottom right. This is a bit like the File Explorer in Windows, or the Finder on a Mac. It shows you what files and folders R can see.
You should also be able to see that the two rectangles on the right have a number of other “tabs.” These work like tabs on a web browser.
The top rectangle has the tabs “Environment” and “History.” The History tab keeps a record of commands you’ve recently typed into the Console. This can sometimes be useful.
The bottom rectangle has the tabs “Files,” “Plots,” “Packages,” “Help,” and “Viewer.” We’ll cover what these other tabs do later on.
Before you start
Before starting this module, you need to run an R command which makes a folder and downloads the files you will need for each workshop.
- Click on the Console pane
- Copy-paste the following command into the console
source('https://raw.githubusercontent.com/benwhalley/lifesavR/main/bootstrap.R')
Your console should now look like this:
Press return (enter) to run the command. If your console looks like the image below, then you are ready to start the session.
Working interactively in R Markdown
Click on the lifesavr folder in the Files pane. Notice that some files have the extension .rmd. These are R Markdown files. It is important that any R Markdown file you create has the extension .rmd (or .Rmd), because this is how RStudio knows what they contain.
R Markdown is a way of combining R with natural language. It allows you to integrate the results of your data analysis into high quality reports, research papers, dissertations or books. Because it’s such a powerful tool, this module provides an early, gentle introduction to R Markdown.
RStudio needs to distinguish R code from narrative text. This is done by putting the code inside some special characters, creating what’s referred to as a chunk. A chunk is opened using the symbols ```{r}, and closed using the symbols ```. This is what a chunk looks like in RStudio (this chunk has been given the optional name life):
A code chunk in the RMarkdown editor
NOTE: The symbols which start and end a chunk are backticks, not single quotes.
On Windows
On a Mac
Running R code within a chunk
Watch the following short video to see how to run code within a chunk.
In each session you will work in a single R Markdown file.
Click on session-1.rmd, which is the file you need for this session.
As the video shows, one way to run code within a chunk is to execute the commands one at a time, using the following keys:
Windows, Linux: Ctrl + ↵
Mac: ⌘ + ↩
- Locate the first chunk in session-1.rmd
- Place your cursor (anywhere) on the line that says library(tidyverse)
- Run this line by pressing the keys above
You will see some output appear in the console. Don’t worry about the details for now, we’ll explain those later. However, one of the effects of the command you have just run is to load some data about diamonds.
Now position your cursor on the line that says diamonds and run the commands.
You should see a scatterplot of the diamonds data appear below the chunk:
Congratulations! You have just run your first lines of R. The code to produce the plot consisted of three lines. You can also run part of a line by highlighting the code you want to run:
- Select (highlight) the word diamonds
- Run the code
This prints the first few lines of the diamonds data:
| carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|
| 0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
| 0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
| 0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
| 0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.2 | 4.23 | 2.63 |
| 0.31 | Good | J | SI2 | 63.3 | 58 | 335 | 4.34 | 4.35 | 2.75 |
| 0.24 | Very Good | J | VVS2 | 62.8 | 57 | 336 | 3.94 | 3.96 | 2.48 |
| 0.24 | Very Good | I | VVS1 | 62.3 | 57 | 336 | 3.95 | 3.98 | 2.47 |
| 0.26 | Very Good | H | SI1 | 61.9 | 55 | 337 | 4.07 | 4.11 | 2.53 |
| 0.22 | Fair | E | VS2 | 65.1 | 61 | 337 | 3.87 | 3.78 | 2.49 |
| 0.23 | Very Good | H | VS1 | 59.4 | 61 | 338 | 4 | 4.05 | 2.39 |
Why would you want to run part of a line of code? In these workshops you will combine simple steps into sequences which do a particular job, such as generating a plot. It’s natural, especially when you’re new to R, that the full sequence of commands won’t do exactly what you want first time. Running part of your code allows you to identify the steps which are correct. This allows you to modify subsequent steps until your code produces the required results. Remember this technique as you will be using it extensively in these workshops.
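As an illustration of running part of a sequence, here is a minimal sketch of a three-line plotting command. The exact code in session-1.rmd is not reproduced in this worksheet, so the choice of carat and price as the plotted variables is an assumption made for the example.

```r
library(tidyverse)

# Highlighting only the word `diamonds` and running it prints the data
diamonds

# Running all three lines sends the data to ggplot and draws a scatterplot
# (carat and price are assumed here purely for illustration)
diamonds %>%
  ggplot(aes(carat, price)) +
  geom_point()
```

If the plot is not what you expected, running the first line alone confirms that the data step is correct, which narrows the problem down to the later lines.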
Variables
It is important to understand what a variable is in R. A variable is a name which can be assigned a value using the assignment operator: <-.
Run the lines in the chunk named life.
The results should look similar to this:
This code stores the results of the addition 40 + 2. Line 1 assigns the value 42 to the variable meaningoflife. The assignment operator <- looks like an arrow that points to the left. This is a reminder that the results of the calculation on the right hand side will be assigned to the variable on the left hand side. Line 2 displays the value of meaningoflife. Line 3 is the output of the life chunk.
Variables that you create are stored in what’s called the Global Environment. You can see them in the Environment pane.
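For reference, the two lines of code described above can be sketched as follows. This is reconstructed from the description, so the chunk in session-1.rmd may differ in detail.

```r
# Line 1: assign the result of 40 + 2 to the variable meaningoflife
meaningoflife <- 40 + 2

# Line 2: typing the variable name on its own displays its value
meaningoflife
```

After running both lines, meaningoflife should also appear in the Environment pane with the value 42.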
Creating a chunk
You will be creating many chunks, so learn the following keyboard shortcuts for inserting the opening and closing characters:
Windows, Linux: Ctrl + Alt + I
Mac: ⌘ + ⌥ + I
Exercise 1
- Use the keyboard shortcut to create a chunk
- Inside the chunk, create a variable called myfirstvariable which stores the result of the addition 2 + 2
- Run the chunk
After completing these steps, your environment should look like this:
Environment after Exercise 1
Loading packages
- loading packages, especially tidyverse, before anything else
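A minimal sketch of what this looks like in practice, assuming the tidyverse package is already installed (as it should be on the RStudio Server):

```r
# Load the tidyverse once, at the top of the file, before any other code
library(tidyverse)

# Once loaded, tidyverse functions and example data such as diamonds are available
diamonds
```

Putting library(tidyverse) in the first chunk means that every later chunk can rely on it having been run.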
Built in data
- explain datasets
- explain that a tibble is a kind of data.frame, and that for our purposes the two can be treated the same
- show mtcars, diamonds (by show I just mean type name and run it to see data in rmd window interactively)
- show that you can click around in this (e.g. to see all columns and all rows - only 10 shown on first page)
mtcars
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
- show other datasets inside packages: gapminder
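The points above could be demonstrated with something like the following sketch. It assumes the gapminder package is installed; class() is a base R function used here only to show what kind of object each dataset is.

```r
library(tidyverse)

# mtcars ships with R and is a plain data.frame
class(mtcars)

# diamonds comes with ggplot2 (part of the tidyverse) and is a tibble;
# a tibble is also a data.frame, so the same tools work on both
class(diamonds)

# gapminder lives in its own package (assumed to be installed);
# the :: notation reaches into a package without loading it
gapminder::gapminder
```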
Exploring and checking data
- glimpse, head and clicking on the Environment window to look at data
- highlight types of data shown in glimpse (dbl, ord) (we will come back to this below)
There is a menu which allows you to view your variables as a list or a grid:
Environment with list/grid menu
Grid view shows you the type in the Type column.
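The two functions mentioned above can be tried on the diamonds data. This is a short sketch rather than the exact code used in the session.

```r
library(tidyverse)

# glimpse() lists every column with its type (e.g. <dbl>, <ord>) and its first few values
diamonds %>% glimpse()

# head() returns just the first six rows
diamonds %>% head()
```

glimpse is most useful for checking column types, while head gives a quick look at the values in their usual table layout.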
ggplot and the pipe
Scatter plots
- the pipe %>% sends data to the next bit of code
- Introduce/recall ggplot
- TODO find an example dataset which requires geom_jitter because it is on an integer scale
- demonstrate geom_point and geom_jitter:
# TODO find better example for this
attitude %>%
  ggplot(aes(rating, complaints)) +
  geom_point()
# vs.
attitude %>%
  ggplot(aes(rating, complaints)) +
  geom_jitter()
Boxplots
gapminder::gapminder %>%
  ggplot(aes(continent, lifeExp)) +
  geom_boxplot()
Problems with x axes
Show this and highlight it’s not what we expect
mtcars %>%
  ggplot(aes(am, mpg)) +
  geom_boxplot()
Warning: Continuous x aesthetic -- did you forget aes(group=...)?
The reason is visible here:
mtcars %>% glimpse()
Rows: 32
Columns: 11
$ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
$ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
$ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
$ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
$ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
$ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…
The am column has type dbl…
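The same thing can be checked for a single column. The sketch below uses base R's class() and dplyr's count(), neither of which is used elsewhere in this worksheet, to show that am is stored as a number even though it only ever takes two values. This is why ggplot treats it as a continuous axis.

```r
library(tidyverse)

# am is stored as a number ("numeric" is what glimpse abbreviates to dbl)
class(mtcars$am)

# ...but it only takes the values 0 and 1, so it is really a category
mtcars %>% count(am)

# wrapping the column in factor() changes how R treats it
class(factor(mtcars$am))
```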
We must tell R it’s a factor:
mtcars %>%
  ggplot(aes(factor(am), mpg)) +
  geom_boxplot()
Adding color. This works:
diamonds %>%
  ggplot(aes(carat, price, colour = clarity)) +
  geom_point()
This doesn’t work so well:
mtcars %>%
  ggplot(aes(wt, mpg, color = cyl)) +
  geom_point()
We can improve it like this:
mtcars %>%
  ggplot(aes(wt, mpg, color = factor(cyl))) +
  geom_point()
Check your knowledge
- What is mtcars?
- Explain what glimpse does
- What is the %>% symbol called and what does it do?
- What is the <- symbol called and what does it do?
- What is the difference between a dbl and an ord/fct?
- Give an example of when the difference between dbl and fct matters when making a plot
- How can you convert a variable from a dbl to a fct?
- What is the difference between geom_jitter() and geom_point()?
- Why is geom_jitter useful sometimes?
Extensions
- Lots more practice plots with different datasets?
- Better plotting worksheet stage 4?